Abstract: A large amount of textual data is available on the Internet today, and these data constitute a valuable resource. Discovering knowledge from textual databases, or text mining for short, is an important but difficult challenge because of the richness and ambiguity of natural language, which complicate analysis of the data. This raises the question of who is to read and analyse such data: at this scale, manual analysis and effective extraction of useful information are hardly feasible. We therefore argue that automatic tools are needed to analyse large textual collections by automatically finding relevant information. The proposed approach relies on keyword features to discover association rules among keywords and to label the documents. The system ignores the order in which words occur and focuses instead on the words and their statistical distributions across documents. The main contributions of the technique are (i) converting documents from unstructured to structured form with an Information Retrieval weighting scheme, TF-IDF, which is used for keyword (feature) selection and automatically selects the most frequently occurring keywords from which association rules are generated, and (ii) applying a Data Mining technique to discover association rules. The system requires a Pre-processing phase consisting of transformation, stemming and indexing of the documents. Stemming is a common requirement in natural language processing; its main purpose is to reduce the different grammatical forms of a word (noun, adjective, verb, adverb, etc.) to their root form. Pre-processing is followed by an Association Rule Mining (ARM) phase and a Visualization phase, in which the results are visualized. The input consists of static web pages about district-level road accidents and measures to prevent them.
Keywords: HARMT algorithm; Porter Stemming Algorithm; Text Preprocessing Phase; Association Rule Mining Phase; network lifetime.
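The pipeline outlined in the abstract (pre-processing, TF-IDF keyword selection, conversion to structured form, association rule mining) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not implement the HARMT algorithm. Rule mining here uses a plain level-wise Apriori. Several pieces are assumptions: the tiny stop-word list is hypothetical, and stemming is omitted (a Porter stemmer such as NLTK's `PorterStemmer` could be applied to each token inside `preprocess`). Keywords are assumed to be selected globally by summing TF-IDF weights over all documents with a smoothed IDF, log((1 + n) / (1 + df)) + 1, so that a keyword present on every page keeps a positive weight and selected keywords can co-occur across documents. All function names and sample pages are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "at",
              "is", "are", "was", "for", "by", "with", "when"}


def preprocess(text):
    """Transformation step: lowercase, keep alphabetic tokens, drop stop words.
    Stemming is omitted; a Porter stemmer could be applied to each token here."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def select_keywords(docs, k):
    """Rank terms by TF-IDF summed over all documents and return the top k.
    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1 (assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    score = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1
            score[term] += (count / len(doc)) * idf
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return ranked[:k]


def to_transactions(docs, keywords):
    """Structured form: each document becomes the set of selected keywords it contains."""
    kw = set(keywords)
    return [frozenset(t for t in doc if t in kw) for doc in docs]


def apriori(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for all frequent itemsets."""
    n = len(transactions)
    support = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates:
        frequent = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                frequent[c] = s
        support.update(frequent)
        size += 1
        # Join frequent itemsets, then prune candidates with an infrequent subset.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == size}
        candidates = {c for c in joined
                      if all(c - {x} in frequent for x in c)}
    return support


def association_rules(support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for confident rules."""
    rules = []
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a frequent itemset is frequent, so lhs is in support.
                confidence = s / support[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, confidence))
    return rules


if __name__ == "__main__":
    pages = [  # hypothetical sample pages about road accidents
        "Speeding drivers cause road accidents on the highway",
        "Road accidents increase when drivers ignore helmet rules",
        "Helmet use reduces deaths in road accidents",
        "Speeding on the highway causes accidents at night",
    ]
    docs = [preprocess(p) for p in pages]
    keywords = select_keywords(docs, k=6)
    transactions = to_transactions(docs, keywords)
    frequent = apriori(transactions, min_support=0.5)
    for lhs, rhs, s, c in association_rules(frequent, min_confidence=0.8):
        print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

Raising `min_support` or `min_confidence` yields fewer, stronger rules. The Visualization phase described in the abstract is not covered by this sketch.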